LLMD: Large Language Model Design Ontology

Release: 2024/10/10

Modified on: 2024/10/10
This version:
0.1
Latest version:
https://edrohal.com/llmd
Download serialization:
JSON-LD RDF/XML N-Triples TTL
License:
https://creativecommons.org/publicdomain/zero/1.0/ License
Provenance of this page
Ontology Specification Draft

Abstract

This ontology represents a fragment of the knowledge in the field of Large Language Models (LLMs), focusing on architecture. It provides a framework for comparing LLM designs.

Introduction back to ToC

This ontology offers a simplified structure to understand key architectural variations across LLMs.

LLMD: Overview back to ToC

This ontology has the following classes and properties.

Classes

Object Properties

Data Properties

Annotation Properties

Named Individuals

LLMD: Description back to ToC

This ontology is organized around the Architecture class, the Module class, and the usesModule property. An architecture uses modules, and modules can in turn use other modules, which makes it possible to represent the nested structure of a model. A deep learning model is then linked to its architecture through the hasArchitecture property. We provide a logical and restrictive description of the transformer architecture and its blocks. The number of parameters of a model is specified at the model level with the hasParameters property, and not at the architecture level, which means that two releases of a model with different parameter counts share the same architecture. Finally, we provide a very inclusive definition of a language task, under which multimodal LLMs (MLLMs) count as language models (and therefore as LLMs): a language task only needs to have text or speech among its inputs or outputs.
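
As an illustration of this organisation, the following hand-written Turtle fragment (not an excerpt of the released serialization; the llmd: prefix is simply assumed to expand to https://edrohal.com/llmd#) describes the GPT2 model, its architecture, and a subset of the modules they use, all of which are defined in the cross-reference section below:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

llmd:GPT2_MODEL a llmd:LargeLanguageModel ;
    llmd:hasArchitecture llmd:GPT2 ;                      # the architecture is attached to the model
    llmd:hasParameters "1500000000"^^xsd:int .            # the parameter count lives at the model level

llmd:GPT2 a llmd:TransformerDecoderOnly ;
    llmd:usesModule llmd:GPT_EMBEDDING_LAYER ,            # an architecture uses modules
                    llmd:GPT_DECODER_BLOCK .

llmd:GPT_DECODER_BLOCK a llmd:TransformerDecoderBlock ;
    llmd:usesModule llmd:GPT_DECODER_CAUSAL_ATTENTION ,   # modules can in turn use other modules
                    llmd:GPT_DECODER_MLP .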

Cross-reference for LLMD classes, object properties and data properties back to ToC

This section provides details for each class and property defined by LLMD.

Classes

Architecturec back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Architecture

An architecture is the skeleton of a model. It uses some modules, which can in turn use other modules in a nested structure.
has super-classes
has sub-classes
S4 c, Transformer c
is in domain of
uses Module op
is in range of
has Architecture op, is Module Of op
is disjoint with
Organisation c, Data Type c, Module c, Algorithm c

Attention Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#AttentionLayer

A layer implementing attention between query, key and value representations.
has super-classes
Module c
has sub-classes
Causal Attention Layer c, Cross Attention Layer c, Self Attention Layer c
is in domain of
uses Causal Mask dp
is disjoint with
Embedding Layer c, Multi Layer Perceptron c, Normalization Layer c, Transformer Block c

Booleanc back to ToC or Class ToC

IRI: http://schema.org/Boolean

has super-classes
Data Type c

Causal Attention Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#CausalAttentionLayer

A specific type of attention layer in which tokens cannot attend to tokens that come after them in a sequence.
is equivalent to
Attention Layer c and (uses Causal Mask dp value true)
has super-classes
Attention Layer c
has members
BLOOM Decoder Block Causal Attention Layer ni, GPT Decoder Causal Attention ni, T5 Decoder Causal Attention Layer ni
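
In Turtle, the equivalence above corresponds to a hasValue restriction on the usesCausalMask data property; the following hand-written sketch may differ cosmetically from the released TTL serialization:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

llmd:CausalAttentionLayer owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        llmd:AttentionLayer
        [ a owl:Restriction ;
          owl:onProperty llmd:usesCausalMask ;
          owl:hasValue "true"^^xsd:boolean ]        # the causal mask flag must be set to true
    )
] .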

Corporationc back to ToC or Class ToC

IRI: http://schema.org/Corporation

has super-classes
Organisation c
has members
Google ni, Hugging Face ni, Open A I ni

Cross Attention Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#CrossAttentionLayer

An attention layer that performs attention between two different sequences (usually, the encoded source sequence and the generated target sequence).
has super-classes
Attention Layer c
has members
T5 Decoder Cross-Attention Layer ni

Data Typec back to ToC or Class ToC

IRI: https://edrohal.com/llmd#DataType

A class representing the possible types of data that can be inputs to a model, or more generally an algorithm.
has sub-classes
Boolean c, Image c, Speech c, Text c
is in range of
has Input Type op, has Output Type op
is disjoint with
Organisation c, Architecture c, Module c, Algorithm c

Deep Learning Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#DeepLearningModel

A deep learning model uses some modules (equivalently, it has an architecture), and adjusts some parameters through a training task.
is equivalent to
(has Architecture op some Architecture c) and (has Training Task op some Training Task c) and (has Parameters dp some int)
has super-classes
Machine Learning Model c
has sub-classes
Large Language Model c
is in domain of
has Architecture op

Embedding Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#EmbeddingLayer

A layer that implements a lookup table, associating values from a finite domain (like a vocabulary or a bounded set of integers corresponding to positions) to embeddings.
has super-classes
Module c
has sub-classes
Position Embedding Layer c, Token Embedding Layer c
is in domain of
is Transpose Layer op
is in range of
is Transpose Layer op
is disjoint with
Attention Layer c, Normalization Layer c, Transformer Block c

Imagec back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Image

has super-classes
Data Type c

Language Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LanguageModel

A model that performs some task related to language. It can be multimodal.
is equivalent to
performs Task op some Language Processing Task c
has super-classes
Model c
is in domain of
uses Tokenizer op

Language Processing Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LanguageProcessingTask

A language processing task is a task related to language in the form of text or speech: it has text or speech among its inputs or among its outputs.
is equivalent to
(has Input Type op value Speech ni) or (has Input Type op value Text ni) or (has Output Type op value Speech ni) or (has Output Type op value Text ni)
has super-classes
Task c
has sub-classes
Language Seq2 Seq Task c
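
In Turtle, this union of hasValue restrictions can be sketched as follows (hand-written, and possibly differing cosmetically from the released TTL serialization); Text and Speech here denote the named individuals, thanks to OWL punning:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

llmd:LanguageProcessingTask owl:equivalentClass [
    a owl:Class ;
    owl:unionOf (
        [ a owl:Restriction ; owl:onProperty llmd:hasInputType  ; owl:hasValue llmd:Speech ]
        [ a owl:Restriction ; owl:onProperty llmd:hasInputType  ; owl:hasValue llmd:Text ]
        [ a owl:Restriction ; owl:onProperty llmd:hasOutputType ; owl:hasValue llmd:Speech ]
        [ a owl:Restriction ; owl:onProperty llmd:hasOutputType ; owl:hasValue llmd:Text ]
    )
] .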

Language Processing Training Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LanguageProcessingTrainingTask

A training task that is also a language processing task.
is equivalent to
Language Processing Task c and Training Task c
has members
Masked Language Modeling ni, Next Word Prediction ni

Language Seq2 Seq Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LanguageSeq2SeqTask

A language sequence to sequence task relates some language input sequence to some language output sequence.
is equivalent to
(has Input Type op exactly 1 ) and (has Output Type op exactly 1 )
has super-classes
Language Processing Task c
has members
Text Summarization ni, Text Translation ni

Large Language Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#LargeLanguageModel

A language model that is considered large (i.e., it has more than 100,000,000 parameters).
is equivalent to
Deep Learning Model c and Language Model c and (has Parameters dp some int[> 100000000])
has super-classes
Deep Learning Model c
has members
BLOOM Model ni, GPT2 Model ni, T5.11b.model ni
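
The parameter threshold is expressed through a datatype restriction on hasParameters; the sketch below is hand-written, and the choice of xsd:int together with the minExclusive facet is an assumption derived from the "more than 100,000,000 parameters" wording above:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

llmd:LargeLanguageModel owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        llmd:DeepLearningModel
        llmd:LanguageModel
        [ a owl:Restriction ;
          owl:onProperty llmd:hasParameters ;
          owl:someValuesFrom [
              a rdfs:Datatype ;
              owl:onDatatype xsd:int ;                                         # datatype assumed from the declared range
              owl:withRestrictions ( [ xsd:minExclusive "100000000"^^xsd:int ] )
          ] ]
    )
] .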

Machine Learning Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#MachineLearningModel

A model that can be used to make predictions after having been fitted to some data. Examples: linear models, support-vector machines, large language models...
has super-classes
Model c
has sub-classes
Deep Learning Model c
is in domain of
has Training Task op

Mambac back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Mamba

An S4 architecture that uses the selection mechanism to parametrize the SSM parameters through linear projections of the input.
has super-classes
S4 c

Modelc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Model

has super-classes
Algorithm c
has sub-classes
Language Model c, Machine Learning Model c
is in domain of
Published In dp, has Parameters dp, performs Task op, published By op
is in range of
has Published op

Modulec back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Module

Any (parametrisable or not) block that can be put together with other modules to create another module or an architecture.
has sub-classes
Attention Layer c, Embedding Layer c, Multi Layer Perceptron c, Normalization Layer c, Transformer Block c
is in domain of
is Module Of op, uses Module op
is in range of
is Module Of op, uses Module op
is disjoint with
Task c, Organisation c, Architecture c, Data Type c, Algorithm c

Multi Layer Perceptronc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#MultiLayerPerceptron

A module consisting of the association of several linear layers together with activation functions (which can be the identity function).
is equivalent to
Single Layer Perceptron c or ((uses Module op only Single Layer Perceptron c) and (uses Module op min 1 Single Layer Perceptron c))
has super-classes
Module c
has sub-classes
Single Layer Perceptron c
has members
Bert Encoder Block MLP ni, GPT Decoder MLP ni, T5 Decoder MLP ni
is disjoint with
Attention Layer c, Normalization Layer c, Transformer Block c

Normalization Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#NormalizationLayer

A module that performs renormalization of the input with respect to some measure.
has super-classes
Module c
has members
BERT Encoder Normalization Layer ni, Bloom Embedding Layer Normalization ni, GPT Decoder Normalization Layer ni, T5 Decoder Normalization Layer ni
is disjoint with
Embedding Layer c, Attention Layer c, Multi Layer Perceptron c, Transformer Block c

Position Embedding Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#PositionEmbeddingLayer

An embedding that depends only on the position of a certain token in a sequence.
has super-classes
Embedding Layer c
has members
BLOOM Decoder Block ALIBI LAYER ni, GPT2 Absolute Position Embedding Layer ni, T5 Relative Position Embedding ni

Research Organisationc back to ToC or Class ToC

IRI: http://schema.org/ResearchOrganisation

has super-classes
Organisation c
is in domain of
funded By op
has members
Google AI ni

S4c back to ToC or Class ToC

IRI: https://edrohal.com/llmd#S4

Structured State Space for Sequence Modeling (S4) architecture, modeling dependencies inside sequences through the use of SSM layers.
has super-classes
Architecture c
has sub-classes
Mamba c

Self Attention Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#SelfAttentionLayer

An attention layer that performs attention using queries, keys, and values obtained as projections of the same sequence.
has super-classes
Attention Layer c
has members
Bert Encoder Attention Layer ni

Single Layer Perceptronc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#SingleLayerPerceptron

A single linear layer.
has super-classes
Multi Layer Perceptron c
is in domain of
is Transpose Layer op
is in range of
is Transpose Layer op
has members
BERT Encoder Block MLP Layer 1 ni, BERT Encoder Block MLP Layer 2 ni, BLOOM Desembedding layer ni, GPT Decoder MLP Layer 1 ni, GPT Decoder MLP Layer 2 ni, GPT Desembedding Layer ni, T5 Decoder MLP Layer 1 ni, T5 Decoder MLP Layer 2 ni, T5 Desembedding Layer ni

Speechc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Speech

has super-classes
Data Type c
has members
Speech ni
is also defined as
named individual

Supervised Training Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#SupervisedTrainingTask

A supervised training task consists in associating a given input with a desired output.
has super-classes
Training Task c
is disjoint with
Unsupervised Training Task c

Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Task

A task consists of a well-defined objective relating some input(s) to some output(s).
is equivalent to
(has Input Type op some Data Type c) and (has Output Type op some Data Type c)
has sub-classes
Language Processing Task c, Training Task c
is in domain of
has Input Type op, has Output Type op
is in range of
performs Task op
is disjoint with
Module c

Textc back to ToC or Class ToC

IRI: http://schema.org/Text

has super-classes
Data Type c
has members
Text ni

Token Embedding Layerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TokenEmbeddingLayer

An embedding layer that associates tokens from a vocabulary with embeddings.
has super-classes
Embedding Layer c
has members
BLOOM Embedding Layer ni, GPT Embedding Layer ni, T5 Embedding Layer ni

Tokenizerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Tokenizer

An algorithm that builds a vocabulary of tokens using a corpus. Tokenizers also exist for non-textual data.
has super-classes
Algorithm c
is in range of
uses Tokenizer op
has members
Byte Pair Encoding Tokenizer ni, Byte Pair Encoding With Space Tokenizer ni, Sentence Piece ni

Training Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TrainingTask

A task used to train a model through the optimisation of a given objective function.
has super-classes
Task c
has sub-classes
Supervised Training Task c, Unsupervised Training Task c
is in range of
has Training Task op

Transformerc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#Transformer

The architecture introduced in (Vaswani et al. 2017), using multi-head attention over input tokens to model sequence dependencies, and stacking transformer blocks.
is equivalent to
Transformer Decoder Only c or Transformer Encoder Decoder c or Transformer Encoder Only c
has super-classes
Architecture c
has sub-classes
Transformer Decoder Only c, Transformer Encoder Decoder c, Transformer Encoder Only c

Transformer Blockc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerBlock

A block used in the transformer architecture, using attention, layer normalization, and a multi layer perceptron.
is equivalent to
Transformer Decoder Block c or Transformer Encoder Block c
has super-classes
Module c
has sub-classes
Transformer Decoder Block c, Transformer Encoder Block c
is disjoint with
Embedding Layer c, Attention Layer c, Multi Layer Perceptron c, Normalization Layer c

Transformer Decoder Blockc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerDecoderBlock

A transformer decoder block has a causal attention layer, a multi layer perceptron, and some normalization layer(s). It can also have a cross attention layer when it is part of an encoder-decoder architecture.
is equivalent to
(uses Module op some Normalization Layer c) and (uses Module op only Causal Attention Layer c or Cross Attention Layer c or Multi Layer Perceptron c or Normalization Layer c) and (uses Module op exactly 1 Causal Attention Layer c) and (uses Module op exactly 1 Multi Layer Perceptron c) and (uses Module op max 1 Cross Attention Layer c)
has super-classes
Transformer Block c
has members
BLOOM Decoder Block ni, GPT Decoder Block ni, T5 Decoder Block ni
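
The equivalence above combines an existential restriction, a universal restriction, and qualified cardinality restrictions on usesModule; a hand-written Turtle sketch (which may differ cosmetically from the released TTL serialization) reads:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

llmd:TransformerDecoderBlock owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        # at least one normalization layer
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:someValuesFrom llmd:NormalizationLayer ]
        # only attention, MLP, and normalization modules are allowed
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:allValuesFrom [ a owl:Class ;
            owl:unionOf ( llmd:CausalAttentionLayer llmd:CrossAttentionLayer
                          llmd:MultiLayerPerceptron llmd:NormalizationLayer ) ] ]
        # exactly one causal attention layer
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:qualifiedCardinality "1"^^xsd:nonNegativeInteger ;
          owl:onClass llmd:CausalAttentionLayer ]
        # exactly one multi layer perceptron
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:qualifiedCardinality "1"^^xsd:nonNegativeInteger ;
          owl:onClass llmd:MultiLayerPerceptron ]
        # at most one cross attention layer (only in encoder-decoder stacks)
        [ a owl:Restriction ;
          owl:onProperty llmd:usesModule ;
          owl:maxQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
          owl:onClass llmd:CrossAttentionLayer ]
    )
] .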

Transformer Decoder Onlyc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerDecoderOnly

A transformer using only a stack of decoder blocks.
is equivalent to
Transformer c and (uses Module op some Transformer Decoder Block c) and (uses Module op only Embedding Layer c or Multi Layer Perceptron c or Normalization Layer c or Position Embedding Layer c or Transformer Decoder Block c)
has super-classes
Transformer c
has members
BLOOM Architecture ni, GPT2 Architecture ni
is disjoint with
Transformer Encoder Decoder c, Transformer Encoder Only c

Transformer Encoder Blockc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerEncoderBlock

A transformer encoder block has a self attention layer and a multi layer perceptron layer, and some normalization layer(s).
is equivalent to
(uses Module op some Normalization Layer c) and (uses Module op exactly 1 Multi Layer Perceptron c) and (uses Module op exactly 1 Self Attention Layer c)
has super-classes
Transformer Block c
has members
BERT Encoder Block ni

Transformer Encoder Decoderc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerEncoderDecoder

A transformer using an encoder stack and a decoder stack.
is equivalent to
Transformer c and ((uses Module op some Transformer Decoder Block c) and (uses Module op some Transformer Encoder Block c)) and (uses Module op only Embedding Layer c or Multi Layer Perceptron c or Normalization Layer c or Position Embedding Layer c or Transformer Decoder Block c or Transformer Encoder Block c)
has super-classes
Transformer c
has members
T5 ni
is disjoint with
Transformer Decoder Only c, Transformer Encoder Only c

Transformer Encoder Onlyc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#TransformerEncoderOnly

A transformer using only an encoder stack.
is equivalent to
Transformer c and (uses Module op some Transformer Encoder Block c) and (uses Module op only Embedding Layer c or Multi Layer Perceptron c or Normalization Layer c or Position Embedding Layer c or Transformer Encoder Block c)
has super-classes
Transformer c
is disjoint with
Transformer Decoder Only c, Transformer Encoder Decoder c

Unsupervised Training Taskc back to ToC or Class ToC

IRI: https://edrohal.com/llmd#UnsupervisedTrainingTask

An unsupervised training task uses unlabeled data to define input-output pairs that a model should associate.
is equivalent to
Training Task c and (not (Supervised Training Task c))
has super-classes
Training Task c
has members
Masked Language Modeling ni, Next Word Prediction ni
is disjoint with
Supervised Training Task c

Object Properties

funded Byop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#fundedBy

A research organisation is usually funded by some other organisations, which can be corporations, but also governments or even other research organisations.
has domain
Research Organisation c
has range
Organisation c

has Architectureop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasArchitecture

A deep learning model has an architecture which corresponds to a set of modules that are linked together.

has characteristics: functional

has domain
Deep Learning Model c
has range
Architecture c

has Input Typeop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasInputType

The input type of a task.
has domain
Task c
has range
Data Type c

has Output Typeop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasOutputType

The output type of a task.
has domain
Task c
has range
Data Type c

has Publishedop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasPublished

has domain
Organisation c
has range
Model c
is inverse of
published By op

has Training Taskop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#hasTrainingTask

has domain
Machine Learning Model c
has range
Training Task c

is Module Ofop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#isModuleOf

has characteristics: asymmetric, irreflexive

has domain
Module c
has range
Architecture c or Module c
is inverse of
uses Module op

is Transpose Layerop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#isTransposeLayer

Some linear layers have weights tied in such a way that the matrix representing the transformation associated with one is the transpose of the matrix associated with the other. This property is extended to embedding layers when they are seen as linear layers that send a one-hot vector to the embedding of the associated item.

has characteristics: symmetric

has domain
Embedding Layer c or Single Layer Perceptron c
has range
Embedding Layer c or Single Layer Perceptron c

performs Taskop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#performsTask

A model performs a task when it is able to reliably associate the input of the task to the output of the task.
has domain
Model c
has range
Task c

published Byop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#publishedBy

A model is usually published by some organization.

has characteristics: functional

has domain
Model c
has range
Organisation c
is inverse of
has Published op

uses Moduleop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#usesModule

An architecture or module can use a module to perform some task. This property corresponds to a hierarchy in the architecture and is therefore asymmetric. It relates to the functional structure of the architecture and not directly to connections in a neural network; this is why a module is not permitted to use itself in order to model a recurrent neural network: the property is irreflexive.

has characteristics: asymmetric, irreflexive

has domain
Architecture c or Module c
has range
Module c
is inverse of
is Module Of op
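
A hand-written Turtle sketch of this property declaration (which may differ cosmetically from the released TTL serialization) is:

@prefix llmd: <https://edrohal.com/llmd#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

llmd:usesModule a owl:ObjectProperty ,
                  owl:AsymmetricProperty ,         # the module hierarchy has no two-way cycles
                  owl:IrreflexiveProperty ;        # and no module uses itself
    rdfs:domain [ a owl:Class ; owl:unionOf ( llmd:Architecture llmd:Module ) ] ;
    rdfs:range  llmd:Module ;
    owl:inverseOf llmd:isModuleOf .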

uses Tokenizerop back to ToC or Object Property ToC

IRI: https://edrohal.com/llmd#usesTokenizer

Most language models need tokenization of their text input (not all: the Mamba architecture is able to operate at the byte level).

has characteristics: functional

has domain
Language Model c
has range
Tokenizer c

Data Properties

has Parametersdp back to ToC or Data Property ToC

IRI: https://edrohal.com/llmd#hasParameters

has domain
Model c
has range
int

Published Indp back to ToC or Data Property ToC

IRI: https://edrohal.com/llmd#PublishedIn

has domain
Model c
has range
date Time

uses Causal Maskdp back to ToC or Data Property ToC

IRI: https://edrohal.com/llmd#usesCausalMask

has characteristics: functional

has domain
Attention Layer c
has range
boolean

Annotation Properties

is Rule Enabledap back to ToC or Annotation Property ToC

IRI: http://swrl.stanford.edu/ontologies/3.3/swrla.owl#isRuleEnabled

Named Individuals

Bert Encoder Attention Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_ATTENTION_LAYER

The multi-head self-attention layer found in BERT-style encoder blocks.
belongs to
Self Attention Layer c

BERT Encoder Blockni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_BLOCK

The classical encoder block as implemented in BERT, and later other encoders like T5.
belongs to
Transformer Encoder Block c
has facts
uses Module op Bert Encoder Attention Layer ni
uses Module op Bert Encoder Block MLP ni
uses Module op BERT Encoder Normalization Layer ni

Bert Encoder Block MLPni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_MLP

The two-layer perceptron with ReLU activation used at the end of BERT-style encoder blocks.
belongs to
Multi Layer Perceptron c
has facts
uses Module op BERT Encoder Block MLP Layer 1 ni
uses Module op BERT Encoder Block MLP Layer 2 ni

BERT Encoder Block MLP Layer 1ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_MLP_LAYER_1

The first linear layer of the two-layered perceptron at the end of the BERT-style encoder block.
belongs to
Single Layer Perceptron c

BERT Encoder Block MLP Layer 2ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_MLP_LAYER_2

The second linear layer of the two-layered perceptron at the end of the BERT-style encoder block.
belongs to
Single Layer Perceptron c

BERT Encoder Normalization Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BERT_ENCODER_NORMALIZATION_LAYER

The layer normalization layer used in the BERT encoder block.
belongs to
Normalization Layer c

BLOOM Architectureni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM

The BLOOM architecture, a decoder-only transformer that applies a layer normalization after the embedding layer and whose language modeling head (desembedding layer) has its weights tied with the embedding layer.
belongs to
Transformer Decoder Only c
has facts
uses Module op BLOOM Decoder Block ni
uses Module op BLOOM Desembedding layer ni
uses Module op BLOOM Embedding Layer ni
uses Module op Bloom Embedding Layer Normalization ni

BLOOM Decoder Blockni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_DECODER_BLOCK

A block of the decoder stack in the BLOOM architecture. It uses the same architecture as GPT1, except for the additional ALIBI module inside the attention layer.
belongs to
Transformer Decoder Block c
has facts
uses Module op BLOOM Decoder Block Causal Attention Layer ni
uses Module op GPT Decoder MLP ni
uses Module op GPT Decoder Normalization Layer ni

BLOOM Decoder Block ALIBI LAYERni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_DECODER_BLOCK_ALIBI_LAYER

The ALIBI module used inside the BLOOM decoder block to tune the attention scores depending on relative positions of tokens.
belongs to
Position Embedding Layer c

BLOOM Decoder Block Causal Attention Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_DECODER_BLOCK_CAUSAL_ATTENTION_LAYER

The causal attention layer used inside the BLOOM decoder block.
belongs to
Causal Attention Layer c
has facts
uses Module op BLOOM Decoder Block ALIBI LAYER ni
uses Causal Mask dp "true"^^boolean

BLOOM Desembedding layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_DESEMBEDDING_LAYER

BLOOM's top-of-the-stack linear layer that outputs logits corresponding to the token distribution; its weights are tied with those of the embedding layer.
belongs to
Single Layer Perceptron c
has facts
is Transpose Layer op BLOOM Embedding Layer ni

BLOOM Embedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_EMBEDDING_LAYER

The embedding layer at the start of the BLOOM architecture, associating tokens with vectors.
belongs to
Token Embedding Layer c
has facts
is Transpose Layer op BLOOM Desembedding layer ni

Bloom Embedding Layer Normalizationni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_EMBEDDING_LAYER_NORM

The layer normalization behind the embedding layer of BLOOM.
belongs to
Normalization Layer c

BLOOM Modelni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BLOOM_MODEL

The BLOOM model trained and implemented by the BigScience initiative and published by HuggingFace. It is a multilingual LLM that was "trained in complete transparency" according to HuggingFace.
belongs to
Large Language Model c
has facts
has Architecture op BLOOM Architecture ni
has Training Task op Multi Task Fine Tunning ni
has Training Task op Next Word Prediction ni
performs Task op Next Word Prediction ni
published By op Hugging Face ni
uses Tokenizer op Byte Pair Encoding With Space Tokenizer ni
Published In dp "2022-07-06T00:00:00"^^date Time
has Parameters dp "176000000000"^^int

Byte Pair Encoding Tokenizerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BytePairEncodingTokenizer

The Byte Pair Encoding (BPE) algorithm creates a token vocabulary by greedily merging tokens, starting from characters.
belongs to
Tokenizer c

Byte Pair Encoding With Space Tokenizerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#BytePairEncodingWithSpaceTokenizer

A specific implementation of the Byte Pair Encoding algorithm that allows spaces to be part of tokens.
belongs to
Tokenizer c

Googleni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#Google

The Google Corporation.
belongs to
Corporation c

Google AIni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#Google_AI

The Google AI division of Google, focusing on research on the topic of AI.
belongs to
Research Organisation c
has facts
funded By op Google ni

GPT Decoder Blockni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_BLOCK

The transformer decoder block used in the GPT architecture, with only one attention layer and no cross attention layer since GPT is a decoder only transformer.
belongs to
Transformer Decoder Block c
has facts
uses Module op GPT Decoder Causal Attention ni
uses Module op GPT Decoder MLP ni
uses Module op GPT Decoder Normalization Layer ni

GPT Decoder Causal Attentionni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_CAUSAL_ATTENTION

The causal attention layer used in the GPT decoder block.
belongs to
Causal Attention Layer c
has facts
uses Causal Mask dp "true"^^boolean

GPT Decoder MLPni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_MLP

The two layered perceptron used at the end of the decoder block of GPT.
belongs to
Multi Layer Perceptron c
has facts
uses Module op GPT Decoder MLP Layer 1 ni
uses Module op GPT Decoder MLP Layer 2 ni

GPT Decoder MLP Layer 1ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_MLP_LAYER_1

The first layer of the two layer perceptron in the GPT decoder block.
belongs to
Single Layer Perceptron c

GPT Decoder MLP Layer 2ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_MLP_LAYER_2

The second layer of the two layer perceptron in the GPT decoder block.
belongs to
Single Layer Perceptron c

GPT Decoder Normalization Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DECODER_NORMALIZATION_LAYER

The normalization used inside the GPT decoder block.
belongs to
Normalization Layer c

GPT Desembedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_DESEMBEDDING_LAYER

The GPT architecture's top-of-the-stack linear layer that outputs logits corresponding to the token distribution; its weights are tied with those of the embedding layer.
belongs to
Single Layer Perceptron c
has facts
is Transpose Layer op GPT Embedding Layer ni

GPT Embedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_EMBEDDING_LAYER

The token embedding layer at the start of the GPT transformer architecture.
belongs to
Token Embedding Layer c
has facts
is Transpose Layer op GPT Desembedding Layer ni

GPT2 Absolute Position Embedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT_ABSOLUTE_POSITION_EMBEDDING_LAYER

The absolute position embedding used in the GPT1 and GPT2 architectures is a parametrised module that learns embeddings of the input positions.
belongs to
Position Embedding Layer c

GPT2 Architectureni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT2

The architecture of GPT2, which closely follows the GPT1 architecture.
belongs to
Transformer Decoder Only c
has facts
uses Module op GPT2 Absolute Position Embedding Layer ni
uses Module op GPT Decoder Block ni
uses Module op GPT Desembedding Layer ni
uses Module op GPT Embedding Layer ni

GPT2 Modelni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#GPT2_MODEL

The GPT2 model was trained by OpenAI using a similar approach to GPT1, scaling up both the parameters and the training data by a factor of ten. It also uses a different initialisation scheme for training.
belongs to
Large Language Model c
has facts
has Architecture op GPT2 Architecture ni
has Training Task op Next Word Prediction ni
performs Task op Next Word Prediction ni
published By op Open A I ni
uses Tokenizer op Byte Pair Encoding Tokenizer ni
Published In dp "2019-11-05T00:00:00"^^date Time
has Parameters dp "1500000000"^^int

Hugging Faceni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#HuggingFace

HuggingFace, an American/French company known for its extensive Transformers library, which implements many popular deep learning models.
belongs to
Corporation c

Masked Language Modelingni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#MaskedLanguageModeling

An unsupervised training task in which some input text tokens are masked and must be predicted from the context tokens.
belongs to
Language Processing Training Task c
Unsupervised Training Task c
has facts
has Input Type op Text ni

Multi Task Fine Tunningni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#MultiTaskFineTunning

A supervised training task consisting in learning several classical NLP tasks in parallel, such as translation or finding the referent of a pronoun.
has facts
has Input Type op Text ni
has Output Type op Text ni

Next Word Predictionni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#NextWordPrediction

The unsupervised task consisting in the prediction of the next word in a text sentence. It was popularized as a pretraining objective for transformers with GPT1 in the paper "Improving Language Understanding by Generative Pre-Training" (2018).
belongs to
Language Processing Training Task c
Unsupervised Training Task c
has facts
has Input Type op Text ni
has Output Type op Text ni

Open A Ini back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#OpenAI

The American organization, founded as a non-profit, that trained the GPT models.
belongs to
Corporation c

Sentence Pieceni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#SentencePiece

The SentencePiece algorithm, which tokenizes text treated as a sequence of Unicode characters, making tokenization reversible, unlike other methods such as byte pair encoding.
belongs to
Tokenizer c

Speechni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#Speech

belongs to
Speech c
is also defined as
class

T5ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5

The architecture of the T5 encoder decoder LLM published by Google.
belongs to
Transformer Encoder Decoder c
has facts
uses Module op BERT Encoder Block ni
uses Module op T5 Decoder Block ni
uses Module op T5 Desembedding Layer ni
uses Module op T5 Embedding Layer ni

T5 Decoder Blockni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_BLOCK

The decoder block used in T5: it is a classical transformer decoder block, the main difference being the presence of a cross attention layer.
belongs to
Transformer Decoder Block c
has facts
uses Module op T5 Decoder Causal Attention Layer ni
uses Module op T5 Decoder Cross-Attention Layer ni
uses Module op T5 Decoder MLP ni
uses Module op T5 Decoder Normalization Layer ni

T5 Decoder Causal Attention Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_CAUSAL_ATTENTION_LAYER

The first attention layer used in the T5 decoder block, performing multi head attention on the input tokens with a causal attention mask.
belongs to
Causal Attention Layer c
has facts
uses Module op T5 Relative Position Embedding ni
uses Causal Mask dp "true"^^boolean

T5 Decoder Cross-Attention Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_CROSSATTENTION_LAYER

The second attention layer in the T5 decoder block, performing cross attention with respect to the latent encoder representations.
belongs to
Cross Attention Layer c

T5 Decoder MLPni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_MLP

The two-layer perceptron with ReLU activation used at the end of the T5-style decoder block.
belongs to
Multi Layer Perceptron c
has facts
uses Module op T5 Decoder MLP Layer 1 ni
uses Module op T5 Decoder MLP Layer 2 ni

T5 Decoder MLP Layer 1ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_MLP_LAYER_1

The first linear layer of the two-layered perceptron at the end of the T5-style decoder block.
belongs to
Single Layer Perceptron c

T5 Decoder MLP Layer 2ni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_MLP_LAYER_2

The second linear layer of the two-layered perceptron at the end of the T5-style decoder block.
belongs to
Single Layer Perceptron c

T5 Decoder Normalization Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DECODER_NORMALIZATION_LAYER

The layer normalization used in the T5 decoder block.
belongs to
Normalization Layer c

T5 Desembedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_DESEMBEDDING_LAYER

The linear layer performing the projection onto vocabulary tokens at the end of the T5 architecture. In the T5 architecture, it shares its weights with the embedding layer.
belongs to
Single Layer Perceptron c
has facts
is Transpose Layer op T5 Embedding Layer ni

T5 Embedding Layerni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_EMBEDDING_LAYER

The embedding layer converting vocabulary tokens to embeddings at the beginning of the T5 architecture.
belongs to
Token Embedding Layer c
has facts
is Transpose Layer op T5 Desembedding Layer ni

T5 Relative Position Embeddingni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5_RELATIVE_POSITION_EMBEDDING

A position embedding technique used in T5 that modifies the logits used inside the attention layer, adding a learned scalar that depends on the position offset between query and key.
belongs to
Position Embedding Layer c

T5.11b.modelni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#T5.11b.model

The 11B-parameter version of the T5 model published by Google, trained on masked language modeling and a variety of supervised tasks.
belongs to
Large Language Model c
has facts
has Architecture op T5 ni
has Training Task op Masked Language Modeling ni
performs Task op Next Word Prediction ni
performs Task op Text Summarization ni
performs Task op Text Translation ni
published By op Google AI ni
uses Tokenizer op Sentence Piece ni
Published In dp "2019-10-23T00:00:00"^^date Time
has Parameters dp "11000000000"^^int

Textni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#Text

The data type of text contents.
belongs to
Text c

Text Summarizationni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#TextSummarization

A task consisting in summarization of text.
belongs to
Language Seq2 Seq Task c
has facts
has Input Type op Text ni
has Output Type op Text ni

Text Translationni back to ToC or Named Individual ToC

IRI: https://edrohal.com/llmd#TextTranslation

A task consisting in associating a text input representing some information with a text output representing the same information in a different language or formalism. Example: French to German is a translation task.
belongs to
Language Seq2 Seq Task c
has facts
has Input Type op Text ni
has Output Type op Text ni

Legend back to ToC

c: Classes
op: Object Properties
dp: Data Properties
ni: Named Individuals

References back to ToC

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems 30.
Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI.

Acknowledgments back to ToC

The authors would like to thank Silvio Peroni for developing LODE, a Live OWL Documentation Environment, which is used for representing the Cross Referencing Section of this document, and Daniel Garijo for developing Widoco, the program used to create the template used in this documentation.